815 research outputs found

Document Clustering with K-tree

Author: De Vries Christopher M.
Geva Shlomo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

This paper describes the approach taken to the XML Mining track at INEX 2008 by a group at the Queensland University of Technology. We introduce the K-tree clustering algorithm in an Information Retrieval context by adapting it for document clustering. Many large scale problems exist in document clustering. K-tree scales well with large inputs due to its low complexity. It offers promising results both in terms of efficiency and quality. Document classification was completed using Support Vector Machines. Comment: 12 pages, INEX 200

arXiv.org e-Print Archive

Queensland University of Technology ePrints Archive

TopSig: Topology Preserving Document Signatures

Author: De Vries Christopher M.
Geva Shlomo
Publication date: 01/01/2011

Performance comparisons between File Signatures and Inverted Files for text retrieval have previously shown several significant shortcomings of file signatures relative to inverted files. The inverted file approach underpins most state-of-the-art search engine algorithms, such as Language and Probabilistic models. It has been widely accepted that traditional file signatures are inferior alternatives to inverted files. This paper describes TopSig, a new approach to the construction of file signatures. Many advances in semantic hashing and dimensionality reduction have been made in recent times, but these have not so far been linked to general purpose, signature file based search engines. This paper introduces a different signature file approach that builds upon and extends these recent advances. We are able to demonstrate significant improvements in the performance of signature file based indexing and retrieval, performance that is comparable to that of state-of-the-art inverted file based systems, including Language models and BM25. These findings suggest that file signatures offer a viable alternative to inverted files in suitable settings, and from the theoretical perspective they position the file signatures model in the class of Vector Space retrieval models. Comment: 12 pages, 8 figures, CIKM 201
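The abstract above does not spell out how TopSig signatures are built, so the following is only a minimal sketch of one generic way to obtain a binary document signature from a weighted bag of words: project onto random +1/-1 vectors and keep the sign of each component, then compare signatures by Hamming distance. The function names (`make_projection`, `signature`, `hamming`), the 64-bit width, and the example terms are assumptions for illustration, not the authors' construction.

```python
import random

def make_projection(vocab, bits, seed=0):
    """Hypothetical helper: give every term a random +1/-1 vector of length `bits`."""
    rng = random.Random(seed)
    return {term: [rng.choice((-1, 1)) for _ in range(bits)] for term in vocab}

def signature(term_weights, projection, bits):
    """Project a weighted bag of words and keep only the sign of each component."""
    acc = [0.0] * bits
    for term, weight in term_weights.items():
        vec = projection.get(term)
        if vec is None:
            continue  # terms outside the vocabulary are ignored in this sketch
        for i, v in enumerate(vec):
            acc[i] += weight * v
    sig = 0
    for i, a in enumerate(acc):
        if a > 0:
            sig |= 1 << i  # bit i is set when component i is positive
    return sig

def hamming(a, b):
    """Number of differing bits between two integer signatures."""
    return bin(a ^ b).count("1")

vocab = ["k", "tree", "cluster", "signature"]
proj = make_projection(vocab, 64, seed=1)
doc_a = signature({"k": 1.0, "tree": 2.0}, proj, 64)
doc_b = signature({"cluster": 1.0, "signature": 3.0}, proj, 64)
print(hamming(doc_a, doc_b))
```

Because only signs are kept, scaling all term weights of a document by the same positive factor leaves its signature unchanged, which is one reason sign-based signatures behave like a vector space model under cosine-style comparison.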

arXiv.org e-Print Archive

CiteSeerX

Crossref

Queensland University of Technology ePrints Archive

K-tree: Large Scale Document Clustering

Author: De Vries Christopher M.
Geva Shlomo
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2009

We introduce K-tree in an information retrieval context. It is an efficient approximation of the k-means clustering algorithm. Unlike k-means, it forms a hierarchy of clusters. It has been extended to address issues with sparse representations. We compare performance and quality to CLUTO using document collections. The K-tree has a low time complexity that is suitable for large document collections. Its tree structure allows for efficient disk-based implementations where space requirements exceed main memory. Comment: 2 pages, SIGIR 200
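For orientation, here is a minimal sketch of plain k-means (Lloyd's algorithm), the flat clustering method that the abstract says K-tree approximates. It is not the K-tree algorithm itself: it builds no hierarchy and does nothing special for sparse vectors. The function name `kmeans`, the fixed iteration count, and the toy points are assumptions for illustration.

```python
def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's k-means on lists of floats; returns (centroids, assignments)."""
    centroids = [list(c) for c in centroids]
    assign = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        # (squared Euclidean distance).
        for i, p in enumerate(points):
            assign[i] = min(
                range(len(centroids)),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
            )
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign

points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
centroids, assign = kmeans(points, [[0.0, 0.0], [10.0, 10.0]])
print(centroids, assign)
```

Each iteration compares every point with every centroid, which is the cost a tree-structured approximation aims to reduce by searching only along one path of a cluster hierarchy.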

arXiv.org e-Print Archive

CiteSeerX

Crossref

Queensland University of Technology ePrints Archive

Random Indexing K-tree

Author: De Vine Lance
De Vries Christopher M.
Geva Shlomo
Publication date: 01/01/2009

Random Indexing (RI) K-tree is the combination of two algorithms for clustering. Many large scale problems exist in document clustering. RI K-tree scales well with large inputs due to its low complexity. It also exhibits features that are useful for managing a changing collection. Furthermore, it solves previous issues with sparse document vectors when using K-tree. The algorithms and data structures are defined, explained and motivated. Specific modifications to K-tree are made for use with RI. Experiments have been executed to measure quality. The results indicate that RI K-tree improves document cluster quality over the original K-tree algorithm. Comment: 8 pages, ADCS 2009; Hyperref and cleveref LaTeX packages conflicted. Removed clevere
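As a rough illustration of the Random Indexing idea mentioned above, the sketch below gives each term a sparse ternary index vector (a few +1/-1 entries, zeros elsewhere) and forms a dense, low-dimensional document vector as the count-weighted sum of its terms' index vectors. The dimension, the number of non-zero entries, the seeding scheme, and the function names are assumptions for illustration; the paper's actual parameters and its K-tree modifications are not given in the abstract.

```python
import random

def index_vector(term, dim=64, nonzeros=4, seed=0):
    """Sparse ternary index vector for a term, derived deterministically from its text."""
    rng = random.Random(f"{seed}:{term}")  # string seeds are hashed deterministically
    vec = [0] * dim
    for pos in rng.sample(range(dim), nonzeros):
        vec[pos] = rng.choice((-1, 1))
    return vec

def document_vector(term_counts, dim=64, nonzeros=4, seed=0):
    """Dense document vector: count-weighted sum of the terms' index vectors."""
    doc = [0] * dim
    for term, count in term_counts.items():
        for i, v in enumerate(index_vector(term, dim, nonzeros, seed)):
            doc[i] += count * v
    return doc

vec = document_vector({"document": 2, "clustering": 1, "tree": 3})
print(len(vec))
```

Because an index vector depends only on the term itself, a new document or a new term can be added without recomputing existing vectors, which fits the abstract's remark about managing a changing collection.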

arXiv.org e-Print Archive

Queensland University of Technology ePrints Archive

Molecular Line Profile Fitting with Analytic Radiative Transfer Models

Author: Adelson L. M.
Bacmann A.
Christopher H. De Vries
Hogerheijde M. R.
Kramer C.
Monteiro T. S.
Nelder J. A.
Philip C. Myers
Publication venue: 'University of Chicago Press'
Publication date: 01/01/2004

We present a study of analytic models of starless cores whose line profiles have "infall asymmetry," or blue-skewed shapes indicative of contracting motions. We compare the ability of two types of analytical radiative transfer models to reproduce the line profiles and infall speeds of centrally condensed starless cores whose infall speeds are spatially constant and range between 0 and 0.2 km s⁻¹. The model line profiles of HCO+ (J=1-0) and HCO+ (J=3-2) are produced by a self-consistent Monte Carlo radiative transfer code. The analytic models assume that the excitation temperature in the front of the cloud is either constant ("two-layer" model) or increases inward as a linear function of optical depth ("hill" model). Each analytic model is matched to the line profile by rapid least-squares fitting. The blue-asymmetric line profiles with two peaks, or with a blue-shifted peak and a red-shifted shoulder, can be well fit by the "HILL5" model (a five-parameter version of the hill model), with an RMS error of 0.02 km s⁻¹. A peak signal-to-noise ratio of at least 30 in the molecular line observations is required for performing these analytic radiative transfer fits to the line profiles. Comment: 48 pages, 20 figures, accepted for publication in Ap

arXiv.org e-Print Archive

CiteSeerX

Crossref

CERN Document Server

The Spitzer c2d Survey Of Nearby Dense Cores. XI. Infrared And Submillimeter Observations Of CB130

Author: Bourke Tyler L.
Chen Jo-Hsin
De Vries Christopher
Dunham Michael M.
Evans Neal J.
Huard Tracy L.
Kim Hyo Jeong
Lee Jeong-Eun
Shirley Yancy L.
Publication date: 05/01/2011

We present new observations of the CB130 region composed of three separate cores. Using the Spitzer Space Telescope, we detected a Class 0 and a Class II object in one of these, CB130-1. The observed photometric data from Spitzer and ground-based telescopes are used to establish the physical parameters of the Class 0 object. Spectral energy distribution fitting with a radiative transfer model shows that the luminosity of the Class 0 object is 0.14-0.16 L☉, which is low for a protostellar object. In order to constrain the chemical characteristics of the core having the low-luminosity object, we compare our molecular line observations to models of lines including abundance variations. We tested both ad hoc step function abundance models and a series of self-consistent chemical evolution models. In the chemical evolution models, we consider a continuous accretion model and an episodic accretion model to explore how variable luminosity affects the chemistry. The step function abundance models can match observed lines reasonably well. The best-fitting chemical evolution model requires episodic accretion and the formation of CO2 ice from CO ice during the low-luminosity periods. This process removes C from the gas phase, providing a much improved fit to the observed gas-phase molecular lines and the CO2 ice absorption feature. Based on the chemical model result, the low luminosity of CB130-1 is explained better as a quiescent stage between episodic accretion bursts rather than being at the first hydrostatic core stage. Funding: NASA 1224608, 1288664, 1407, NNX07AJ72G, 1279198, 1288806, 1342425; NSF AST-0607793, AST-0708158; Korea government (MEST) 2009-0062866; Ministry of Education, Science and Technology 2010-0008704; Astronom

arXiv.org e-Print Archive

Texas ScholarWorks

The Spitzer c2d Survey of Nearby Dense Cores: VI. The Protostars of Lynds Dark Nebula 1221

Author: André
Anglada
Bourke
Caselli
Chadwick H. Young
Chandler
Chen
Chiang
Christopher De Vries
Curiel
Curiel
Dunham
Dunham
Evans
Fazio
Harvey
Harvey
Ivezić
Jes K. Jørgensen
Jørgensen
Kaisa E. Young
Lada
Lee
Lee
Lee
Loinard
Mark J. Claussen
Michael M. Dunham
Neal J. Evans
Noriega-Crespo
Ossenkopf
Rieke
Rodríguez
Shang
Shirley
Shirley
Tyler L. Bourke
Victor Popa
Werner
Wu
Yancy L. Shirley
Yonekura
Young
Young
Young
Young
Publication venue: 'IOP Publishing'
Publication date: 09/07/2009

Observations of Lynds Dark Nebula 1221 from the Spitzer Space Telescope are presented. These data show three candidate protostars towards L1221, only two of which were previously known. The infrared observations also show signatures of outflowing material, an interpretation which is also supported by radio observations with the Very Large Array. In addition, molecular line maps from the Five College Radio Astronomy Observatory are shown. One-dimensional dust continuum modelling of two of these protostars, IRS1 and IRS3, is described. These models show two distinctly different protostars forming in very similar environments. IRS1 shows a higher luminosity and larger inner radius of the envelope than IRS3. The disparity could be caused by a difference in age or mass, orientation of outflow cavities, or the impact of a binary in the IRS1 core. Comment: accepted for publication in Ap

arXiv.org e-Print Archive

Crossref

Texas ScholarWorks

Gene expression profiling of epithelium-associated FcRL4(+) B cells in primary Sjogren's syndrome reveals a pathogenic signature

Author: Bootsma Hendrika
de Lange Kim
Haacke Erlin A
Hickey Peter
Ice John A
Kroese Frans G M
Lessard Christopher J
Pringle Sarah
Spijkervet Frederik K L
van der Vries Gerben B
Verstappen Gwenny M
Vissink Arjan
Publication venue: 'Elsevier BV'
Publication date: 01/05/2020

In primary Sjögren's syndrome (pSS), FcRL4+ B cells are present in inflamed salivary gland tissue, within or in close proximity to ductal epithelium. FcRL4 is also expressed by nearly all pSS-related mucosa-associated lymphoid tissue (MALT) B cell lymphomas, linking FcRL4 expression to lymphomagenesis. Whether glandular FcRL4+ B cells are pathogenic, how these cells originate, and how they functionally differ from FcRL4- B cells in pSS is unclear. This study aimed to investigate the phenotype and function of FcRL4+ B cells in the periphery and parotid gland tissue of patients with pSS. First, circulating FcRL4+ B cells from 44 pSS and 54 non-SS-sicca patients were analyzed by flow cytometry. Additionally, RNA sequencing of FcRL4+ B cells sorted from parotid gland cell suspensions of 6 pSS patients was performed. B cells were sorted from cell suspensions as mini bulk (5 cells/well) based on the following definitions: CD19+CD27-FcRL4- ('naive'), CD19+CD27+FcRL4- ('memory'), and CD19+FcRL4+ B cells. We found that, although FcRL4+ B cells were not enriched in blood in pSS compared with non-SS sicca patients, these cells generally exhibited a pro-inflammatory phenotype. Genes coding for CD11c (ITGAX), T-bet (TBX21), TACI (TNFRSF13B), Src tyrosine kinases and NF-κB pathway-related genes were, among others, significantly upregulated in glandular FcRL4+ B cells versus FcRL4- B cells. Pathway analysis showed upregulation of B cell activation, cell cycle and metabolic pathways. Thus, FcRL4+ B cells in pSS exhibit many characteristics of chronically activated, pro-inflammatory B cells and their gene expression profile suggests increased risk of lymphomagenesis. We postulate that these cells contribute significantly to the epithelial damage seen in the glandular tissue and that FcRL4+ B cells are an important treatment target in pSS.

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen

Advanced model compounds for understanding acid-catalyzed lignin depolymerization : identification of renewable aromatics and a lignin-derived solvent

Author: Alexandra M. Z. Slawin
Christopher S. Lancefield
Ciaran W. Lahive
Claire M. Young
David B. Cordes
Fanny Tran
Johannes G. de Vries
Katalin Barta
Lundquist K.
Lundquist K.
Nicholas J. Westwood
Paul C. J. Kamer
Peter J. Deuss
Tanahashi M.
Zhuohua Sun
Publication venue: 'American Chemical Society (ACS)'
Publication date: 20/07/2016

This work was funded by the EP/J018139/1, EP/K00445X/1 grants (NJW and PCJK), an EPSRC Doctoral Prize Fellowship (CSL), and the European Union (Marie Curie ITN ‘SuBiCat’ PITN-GA-2013-607044, CWL, NJW, PCJK, PJD, KB, JdeV). The development of fundamentally new approaches for lignin depolymerization is challenged by the complexity of this aromatic biopolymer. While overly simplified model compounds often lack relevance to the chemistry of lignin, the direct use of lignin streams poses significant analytical challenges to methodology development. Ideally, new methods should be tested on model compounds that are complex enough to mirror the structural diversity in lignin but still of sufficiently low molecular weight to enable facile analysis. In this contribution, we present a new class of advanced (β-O-4)-(β-5) dilinkage models that are highly realistic representations of a lignin fragment. Together with selected β-O-4, β-5, and β–β structures, these compounds provide a detailed understanding of the reactivity of various types of lignin linkages in acid catalysis in conjunction with stabilization of reactive intermediates using ethylene glycol. The use of these new models has allowed for identification of novel reaction pathways and intermediates and led to the characterization of new dimeric products in subsequent lignin depolymerization studies. The excellent correlation between model and lignin experiments highlights the relevance of this new class of model compounds for broader use in catalysis studies. Only by understanding the reactivity of the linkages in lignin at this level of detail can fully optimized lignin depolymerization strategies be developed. Postprint. Peer reviewe

Proceedings - University of Groningen

Crossref

University of Groningen

ARTS repository - University of Groningen

University of St. Andrews - Pure

St Andrews Research Repository

FigShare

Dissertations of the University of Groningen

Toxicity Weighting for Human Biomonitoring Mixture Risk Assessment: A Proof of Concept

Author: Christopher de Vries Yvette
Kolossa-Gehring Marike
Lebret Erik
Loh Miranda M
Luijten Mirjam
Schmidt Phillipp
Vlaanderen Jelle
Vogel Nina
Publication date: 01/05/2023

Chemical mixture risk assessment has, in the past, primarily focused on exposures quantified in the external environment. Assessing health risks using human biomonitoring (HBM) data provides information on the internal concentration, from which a dose can be derived, of chemicals to which human populations are exposed. This study describes a proof of concept for conducting mixture risk assessment with HBM data, using the population-representative German Environmental Survey (GerES) V as a case study. We first attempted to identify groups of correlated biomarkers (also known as 'communities', reflecting co-occurrence patterns of chemicals) using a network analysis approach (n = 515 individuals) on 51 chemical substances in urine. The underlying question is whether the combined body burden of multiple chemicals is of potential health concern. If so, subsequent questions are which chemicals and which co-occurrence patterns are driving the potential health risks. To address this, a biomonitoring hazard index was developed by summing over hazard quotients, where each biomarker concentration was weighted (divided) by the associated HBM health-based guidance value (HBM-HBGV, HBM value or equivalent). Altogether, for 17 out of the 51 substances, health-based guidance values were available. If the hazard index was higher than 1, then the community was considered of potential health concern and should be evaluated further. Overall, seven communities were identified in the GerES V data. Of the five mixture communities where a hazard index was calculated, the highest hazard community contained N-Acetyl-S-(2-carbamoyl-ethyl)cysteine (AAMA), but this was the only biomarker for which a guidance value was available.
Of the other four communities, one included the phthalate metabolites mono-isobutyl phthalate (MiBP) and mono-n-butyl phthalate (MnBP) with high hazard quotients, which led to hazard indices that exceed the value of one in 5.8% of the participants included in the GerES V study. This biological index method can put forward communities of co-occurrence patterns of chemicals on a population level that need further assessment in toxicology or health effects studies. Future mixture risk assessment using HBM data will benefit from additional HBM health-based guidance values based on population studies. Additionally, accounting for different biomonitoring matrices would provide a wider range of exposures. Future hazard index analyses could also take a common mode of action approach, rather than the more agnostic and non-specific approach we have taken in this proof of concept.
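The hazard index described in the abstract is a sum of hazard quotients, each being a biomarker concentration divided by its health-based guidance value, with an index above 1 flagging potential concern. A minimal sketch of that arithmetic follows; the function names, the choice to skip biomarkers that have no guidance value, and all numbers are assumptions made up for illustration, not GerES V data or real guidance values.

```python
def hazard_quotients(concentrations, guidance_values):
    """Hazard quotient per biomarker: concentration divided by its guidance value.

    Biomarkers without a guidance value are skipped (an assumption of this sketch).
    Concentrations and guidance values must share the same units.
    """
    return {
        name: concentrations[name] / gv
        for name, gv in guidance_values.items()
        if name in concentrations
    }

def hazard_index(concentrations, guidance_values):
    """Hazard index: the sum of the hazard quotients."""
    return sum(hazard_quotients(concentrations, guidance_values).values())

# Invented numbers for one hypothetical participant (not real data).
concentrations = {"MiBP": 50.0, "MnBP": 30.0, "no_guidance_marker": 5.0}
guidance_values = {"MiBP": 100.0, "MnBP": 120.0}

hi = hazard_index(concentrations, guidance_values)
print(hi, "potential concern" if hi > 1 else "below threshold")
```

With these invented inputs the quotients are 0.5 and 0.25, so the index is 0.75 and stays below the threshold of 1. A biomarker lacking a guidance value contributes nothing, which mirrors the abstract's caveat that only 17 of the 51 substances had guidance values available.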

Utrecht University Repository